Prediction of autistic tendencies at 18 months of age via markerless video analysis of spontaneous body movements in 4-month-old infants

Early intervention is now considered the core treatment strategy for autism spectrum disorders (ASD). Thus, it is of significant clinical importance to establish a screening tool for the early detection of ASD in infants. To achieve this goal, in a longitudinal design, we analyzed spontaneous bodily movements of 4-month-old infants from the general population and assessed their ASD-like behaviors at 18 months of age. A total of 26 movement features were calculated from video-recorded bodily movements of infants at 4 months of age. Their risk of ASD was assessed at 18 months of age with the Modified Checklist for Autism in Toddlers (M-CHAT), a widely used screening questionnaire. Infants at high risk for ASD at 18 months of age exhibited less rhythmic and weaker bodily movement patterns at 4 months of age than low-risk infants. When the observed bodily movement patterns were submitted to a machine learning-based analysis, linear and non-linear classifiers successfully predicted ASD-like behavior at 18 months of age based on the bodily movement patterns at 4 months of age, at a level acceptable for practical use. This study analyzed the relationship between spontaneous bodily movements at 4 months of age and the ASD risk at 18 months of age. Experimental results suggested the utility of the proposed method for the early screening of infants at risk for ASD. We revealed that the signs of ASD risk could be detected as early as 4 months after birth, by focusing on the infant’s spontaneous bodily movements.


Supplementary
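Movement frequency ($I_1$)
This feature is defined as the proportion of frames containing bodily movement across the first through $S$-th sub-segments:

$$ {}^{(A_k)}I_1 = \frac{1}{SL} \sum_{s=1}^{S} \sum_{l=1}^{L} \delta_{s,l}, \qquad \delta_{s,l} = \begin{cases} 1 & \text{if } {}^{(A_k)}M_l \ge M_{th} \text{ at frame } l \text{ of sub-segment } s \\ 0 & \text{otherwise} \end{cases} $$

where $L$ is the total number of frames in each sub-segment, and $M_{th}$ is the threshold for determining whether movement has occurred. In this paper, we set $M_{th} = 0.005$.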

Movement strength ($I_2$)
This feature is defined as the magnitude of movement per unit time across the first through $S$-th sub-segments:

$$ {}^{(A_k)}I_2 = \frac{1}{L} \sum_{l \,:\, {}^{(A_k)}M_l \ge M_{th}} {}^{(A_k)}M_l $$

where the sum runs over the frames in which ${}^{(A_k)}M_l$ is $M_{th}$ or more across the first through $S$-th sub-segments, and $L$ is the total number of such frames.

Movement count ($I_3$)
This feature is defined as the number of continuous movement detections divided by the total number of frames $L$:

$$ {}^{(A_k)}I_3 = \frac{1}{L} \sum_{s=1}^{S} {}^{(A_k)}Q_s $$

where ${}^{(A_k)}Q_s$ represents the number of movements detected in the $s$-th video sub-segment. A movement is detected when ${}^{(A_k)}M_l$ reaches $M_{th}$ or more and then falls below $M_{th}$ again.
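The three per-region features described above (movement frequency, strength, and count) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, each sub-segment is assumed to be a plain list of per-frame movement values, and the frame total that normalizes the movement count is assumed to span all sub-segments.

```python
from typing import Sequence

M_TH = 0.005  # movement threshold reported in the text


def movement_frequency(segments: Sequence[Sequence[float]], m_th: float = M_TH) -> float:
    """I1: fraction of frames whose movement value is at or above the threshold."""
    total = sum(len(seg) for seg in segments)
    moving = sum(1 for seg in segments for m in seg if m >= m_th)
    return moving / total if total else 0.0


def movement_strength(segments: Sequence[Sequence[float]], m_th: float = M_TH) -> float:
    """I2: mean movement value over the frames at or above the threshold."""
    active = [m for seg in segments for m in seg if m >= m_th]
    return sum(active) / len(active) if active else 0.0


def movement_count(segments: Sequence[Sequence[float]], m_th: float = M_TH) -> float:
    """I3: completed movement bouts divided by the total number of frames.

    A bout is counted only once the signal has reached the threshold and then
    dropped below it again, so a bout still in progress at the end of a
    sub-segment is not counted.
    """
    total = sum(len(seg) for seg in segments)
    bouts = 0
    for seg in segments:
        moving = False
        for m in seg:
            if m >= m_th:
                moving = True
            elif moving:
                bouts += 1
                moving = False
    return bouts / total if total else 0.0
```

Each function takes the list of sub-segments for one bodily region, so calling them once per region gives the per-region values that the ratio features are built from.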

Ratio of Movement Frequency ($I_4$)
This feature is defined as the ratio of the movement frequency between the bodily regions $A_{k_1}$ and $A_{k_2}$:

$$ I_4 = \frac{{}^{(A_{k_1})}I_1}{{}^{(A_{k_2})}I_1} $$

Ratio of Movement Strength ($I_5$)
This feature is defined as the ratio of the movement strength between the bodily regions $A_{k_1}$ and $A_{k_2}$:

$$ I_5 = \frac{{}^{(A_{k_1})}I_2}{{}^{(A_{k_2})}I_2} $$

Movement Coordination ($I_6$)
Movement coordination between the bodily regions $A_{k_1}$ and $A_{k_2}$ is calculated as the correlation coefficient between the time-series data of ${}^{(A_{k_1})}M_l$ and ${}^{(A_{k_2})}M_l$. Correlation coefficients were computed within sliding temporal windows with a length of 300 frames. The stride of the sliding window was one frame. The correlation coefficients were then averaged to yield a single movement coordination feature.
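The sliding-window correlation described above can be sketched as follows. This is an illustrative sketch rather than the authors' code: the function name is hypothetical, Pearson correlation is assumed, and windows in which either signal is constant (where the coefficient is undefined) are skipped, which the text does not specify.

```python
import numpy as np


def movement_coordination(m1, m2, window=300, stride=1):
    """Mean Pearson correlation between two movement series over sliding windows."""
    m1 = np.asarray(m1, dtype=float)
    m2 = np.asarray(m2, dtype=float)
    if m1.ndim != 1 or m1.shape != m2.shape:
        raise ValueError("m1 and m2 must be 1-D series of equal length")

    coeffs = []
    for start in range(0, len(m1) - window + 1, stride):
        a = m1[start:start + window]
        b = m2[start:start + window]
        # Assumption: skip windows where the correlation is undefined.
        if a.std() == 0 or b.std() == 0:
            continue
        coeffs.append(np.corrcoef(a, b)[0, 1])

    # NaN signals that no valid window was available.
    return float(np.mean(coeffs)) if coeffs else float("nan")
```

With a stride of one frame, consecutive windows overlap almost entirely, so the averaged coefficient behaves like a smoothed estimate of coordination across the recording.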

Movement Rhythm
Central Frequency ($I_7$) and Second Moment around Central Frequency ($I_8$) of Motor Alteration
The time-series data of ${}^{(A_k)}M_l$ of each video sub-segment were analyzed in the frequency domain. Within each video sub-segment, the time-series data within a sliding window with a length of 128 frames were subjected to the fast Fourier transform. The stride of the sliding window was one frame. Then, the average power spectral density (PSD), $P(f)$, was computed by grand-averaging the PSDs across all the sliding windows in all the video sub-segments. Based on $P(f)$, the central frequency ($F_{cntr}$) and the second moment around the central frequency ($D_{cntr}$) were computed according to the following equations:

$$ F_{cntr} = \frac{\sum_{f \le f_{max}} f \, P(f)}{\sum_{f \le f_{max}} P(f)}, \qquad D_{cntr} = \frac{\sum_{f \le f_{max}} (f - F_{cntr})^2 \, P(f)}{\sum_{f \le f_{max}} P(f)} $$

where $f_{max} = 15$ Hz is the maximal frequency range for analysis.
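A possible implementation of the spectral features is sketched below. Several details are assumptions not stated in the text: a frame rate of 30 fps (chosen so that the Nyquist frequency matches the 15 Hz analysis limit), an unnormalized squared-magnitude spectrum as the PSD, no windowing or mean removal, and the hypothetical function name `spectral_features`.

```python
import numpy as np


def spectral_features(segments, fps=30.0, window=128, stride=1, f_max=15.0):
    """Return (central frequency, second moment around it) of the averaged PSD."""
    freqs = np.fft.rfftfreq(window, d=1.0 / fps)
    psd_sum = np.zeros_like(freqs)
    n_windows = 0

    for seg in segments:
        x = np.asarray(seg, dtype=float)
        for start in range(0, len(x) - window + 1, stride):
            spectrum = np.fft.rfft(x[start:start + window])
            psd_sum += np.abs(spectrum) ** 2  # unnormalized PSD (assumption)
            n_windows += 1

    if n_windows == 0:
        raise ValueError("no sub-segment is at least one window long")

    psd = psd_sum / n_windows  # grand average over all windows and sub-segments
    band = freqs <= f_max + 1e-9  # small tolerance for floating-point bin edges
    f, p = freqs[band], psd[band]

    total = p.sum()
    f_cntr = float((f * p).sum() / total)
    d_cntr = float((((f - f_cntr) ** 2) * p).sum() / total)
    return f_cntr, d_cntr
```

Because the normalization of the PSD cancels in both ratios, the choice of scaling does not affect the two features; whether the zero-frequency component is included does, and that is left as in the sketch.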

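Movement of the Body Center
Variation in the Body Center Velocity ($I_{13}$)
The average absolute values of the body center velocity along the x-axis (${}^{G}v_{l,x}$) and y-axis (${}^{G}v_{l,y}$) were computed using the following equation:

$$ I_{13j} = \frac{1}{L} \sum_{l=1}^{L} \left| {}^{G}v_{l,j} \right|, \qquad j \in \{x, y\} $$

where $L$ is the number of frames over which the velocity is averaged.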

Standard Deviation of Body Center Fluctuation ($I_{14}$)
The standard deviations of the body center fluctuations along the x-axis (${}^{G}d_{l,x}$) and y-axis (${}^{G}d_{l,y}$) within the sliding temporal window of length $L_g$ are calculated as follows:

$$ \sigma_j = \sqrt{\frac{1}{L_g} \sum_{l=1}^{L_g} \left( {}^{G}d_{l,j} - {}^{G}\bar{d}_j \right)^2 } $$

where $j \in \{x, y\}$, $L_g = 300$ frames, and ${}^{G}\bar{d}_j$ represents the average value of ${}^{G}d_{l,j}$ within the window. $\sigma_j$ is calculated in the same way in all video sub-segments; $I_{14j}$ is then obtained by averaging all $\sigma_j$.
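The windowed standard deviation can be sketched as below for a single axis. This is a hedged illustration: the function name is hypothetical, a stride of one frame is assumed (the text gives only the window length), and the population standard deviation is used.

```python
import numpy as np


def body_center_sd(segments, window=300, stride=1):
    """Average of windowed standard deviations of one body-center coordinate."""
    sigmas = []
    for seg in segments:
        d = np.asarray(seg, dtype=float)
        for start in range(0, len(d) - window + 1, stride):
            # Population SD (divide by the window length); an assumption.
            sigmas.append(d[start:start + window].std())

    # NaN signals that no sub-segment was long enough for one window.
    return float(np.mean(sigmas)) if sigmas else float("nan")
```

Calling the function once with the x-coordinate series and once with the y-coordinate series gives the two axis-specific values.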

Area of Body Center Excursion ($I_{15}$)
This feature represents the area enclosed by the outermost circumference of the trajectory of the body center excursion. First, in the sliding temporal window of length $L_g = 300$ frames, the points on the outermost circumference of the trajectory of $({}^{G}d_x, {}^{G}d_y)$ are arranged in a clockwise direction based on the method proposed by Kim et al. [1] and defined as $({}^{G}d_{b,x}, {}^{G}d_{b,y})$ ($b = 1, 2, \ldots, B$; $B$ is the total number of points on the outermost circumference). The area enclosed by the outermost circumference is then calculated using the following equation:

$$ \text{Area} = \frac{1}{2} \left| \sum_{b=1}^{B} \left( {}^{G}d_{b,x} \, {}^{G}d_{b+1,y} - {}^{G}d_{b+1,x} \, {}^{G}d_{b,y} \right) \right| $$

where the index $B + 1$ is identified with $1$ so that the circumference is closed.

The objective variable of the GLM was the ordinal score of the M-CHAT items at 18 months (the number of failed items on the M-CHAT), and the explanatory variables were the 26 movement features obtained from the video at 4 months. Because the ordinal scores of the M-CHAT items can be regarded as count data with an upper limit, a binomial distribution was set for the GLM error structure, and a logit function was selected as the link function. Forward-backward stepwise model selection based on Akaike's information criterion (AIC) was performed to search for the best model for predicting the ordinal scores. The prediction accuracy of the selected model was evaluated using leave-one-out cross-validation.
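To make the regression setup concrete, the sketch below fits a binomial GLM with a logit link by iteratively reweighted least squares and evaluates it with leave-one-out cross-validation. It is a simplified illustration, not the analysis pipeline used here: the function names are hypothetical, the upper limit `n_trials` is a parameter the caller must supply (for example, the number of questionnaire items), and the AIC-based stepwise feature selection is omitted entirely.

```python
import numpy as np


def fit_binomial_glm(X, y, n_trials, n_iter=50, tol=1e-8):
    """Fit a logit-link binomial GLM by IRLS; returns [intercept, coefficients...]."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    design = np.column_stack([np.ones(len(X)), X])
    beta = np.zeros(design.shape[1])

    for _ in range(n_iter):
        eta = design @ beta
        p = 1.0 / (1.0 + np.exp(-eta))
        w = n_trials * p * (1.0 - p) + 1e-12  # IRLS weights, guarded against zero
        z = eta + (y - n_trials * p) / w      # working response
        new_beta = np.linalg.solve(design.T @ (w[:, None] * design), design.T @ (w * z))
        converged = np.max(np.abs(new_beta - beta)) < tol
        beta = new_beta
        if converged:
            break
    return beta


def predict_score(beta, X, n_trials):
    """Expected score (n_trials times the fitted probability) for each row of X."""
    X = np.asarray(X, dtype=float)
    design = np.column_stack([np.ones(len(X)), X])
    return n_trials / (1.0 + np.exp(-(design @ beta)))


def loocv_mae(X, y, n_trials):
    """Mean absolute error of the predicted score under leave-one-out cross-validation."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    errors = []
    for i in range(len(y)):
        keep = np.arange(len(y)) != i
        beta = fit_binomial_glm(X[keep], y[keep], n_trials)
        pred = predict_score(beta, X[i:i + 1], n_trials)[0]
        errors.append(abs(pred - y[i]))
    return float(np.mean(errors))
```

In a faithful replication, the model selection step would need to be repeated inside each cross-validation fold or performed beforehand, depending on how the original analysis was run; the text does not say which, so the sketch leaves that choice to the reader.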

Results and discussion
Supplementary Table S3 shows the information on the best regression model obtained from the stepwise analysis. In total, ten features were selected as the best features for predicting the ordinal scores of the M-CHAT items. The model accuracy evaluation by cross-validation showed a mean absolute error of 0.459 between the predicted and true scores. The relationship between the true and predicted scores is shown in Supplementary Figure S1. A significant positive correlation was found between the two scores (r = 0.6214; p < 0.0001).

The effective movement features were largely consistent regardless of whether the ASD high- and low-risk groups were compared dichotomously based on a cutoff criterion or the M-CHAT score itself was regressed. Specifically, among the ten features selected by this analysis, the following eight features showed significant differences between the ASD high- and low-risk groups (see main text): ${}^{(A_6)}I_2$, ${}^{(A_7)}I_{10y}$, ${}^{(A_7)}I_{11x}$, ${}^{(A_7)}I_{11y}$, ${}^{(A_7)}I_{12y}$, ${}^{(A_7)}I_{13y}$, ${}^{(A_7)}I_{14x}$, and ${}^{(A_7)}I_{15}$. These results support the validity of the proposed video-based movement analysis method for assessing ASD-like behaviors in infants.